Basic Principles for Segmenting Thai EDUs
Abstract
This paper proposes a guideline for determining Thai elementary discourse units (EDUs) based on rhetorical structure theory. Carlson and Marcu's (2001) guideline for segmenting English EDUs is modified to suit the segmentation of EDUs in Thai. The proposed principles are used in tagging EDUs to construct a corpus of discourse tree structures. They can also serve as the basis for implementing automatic Thai EDU segmentation. The problems of determining Thai EDUs both manually and automatically are also explored and discussed in
Similar Resources
Thai Rhetorical Structure Analysis
Rhetorical structure analysis (RSA) explores discourse relations among elementary discourse units (EDUs) in a text. It is useful in many text processing tasks that rely on relationships among EDUs, such as text understanding, summarization, and question answering. The Thai language, with its distinctive linguistic characteristics, requires a dedicated technique. This article proposes an approach for Th...
A Statistical and Rule-based Method for Chunking Verbal Units in Thai Texts
Tokenizing a text into a sequence of words is an important step towards text interpretation. This process is required in many applications such as text summarization, semantic search, and machine translation. Instead of splitting text into words, recent work has looked at chunking it into units larger than words. Text chunking is a process to divide a running text into non-overlapp...
Mining Causality from Texts for Question Answering System
This research aims to develop automatic knowledge mining of causality from texts to support an automatic question answering (QA) system in answering 'why' questions, which are among the most crucial forms of questions. The outcome of this research will assist people in diagnosing problems in areas such as plant diseases, health, and industry. While the previous works have extracted causalit...
Syllable-Based Thai-English Machine Transliteration
This article describes the first trial of bidirectional Thai-English machine transliteration applied to the NEWS 2010 transliteration corpus. The system relies on segmenting source-language words into syllable-like units, finding the units' pronunciations, consulting a syllable transliteration table to form target-language word hypotheses, and ranking the hypotheses using syllable n-grams. The app...
Referent resolution for zero pronouns in Thai
1. Introduction. Resolving zero pronouns is a major problem in developing a natural language understanding (NLU) system for Thai. Since subject and object pronouns in Thai can be omitted from a sentence, an NLU system must be capable of identifying the missing subjects or objects in the sentence. This process of identifying referents for zero pronouns, which is a part of referent resolution pr...